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Abstract. We demonstrate how the stochastic global optimization scheme of 
Simulated Annealing can be used to evaluate optimum parameters in the problem 
of DNA breathing dynamics. The breathing dynamics is followed in accordance 
with the stochastic Gillespie scheme with the denaturation zones in double stranded 
DNA studied as a single molecule time series. Simulated Annealing is used to find 
the optimum value of the activation energy for which the equilibrium bubble size 
distribution matches with a given value. It is demonstrated that the method overcomes 
even large noise in the input surrogate data. 
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1. Introduction 

Optimization techniques have been successfully used across disciplines [TJ EJ [3j @J El El [7j . 
In general, a given problem is cast in a manner such that finding the extremum point 
of a functional in some search space renders the desired solution. For illustration, 
consider the non-trivial problem of finding the global minimum in a rugged potential 
energy surface. In this problem one starts from any arbitrary point, and then moves 
on the search space following certain criteria (one could be that a move is accepted if 
the gradient norm decreases in a given move) and converge on a point for which the 
gradient norm is zero. To verify whether the obtained point is indeed a minimum, or 
not, one needs to check if the eigenvalues of the associated Hessian matrix at that point 
are all positive, or not. In general, for any problem, in which a finite number of optimum 
values of parameters are sought after, one can write down an adequate functional and 
extremize it by following the above procedure. 

However an optimization procedure whose mode of action is solely guided by 
minimizing the local gradient norm (one which is deterministic) may face difficulties, 
especially in search spaces with multiple minima. If the obtained minimum is a local 
one, there is no way of escaping from it again and moving towards the global minimum. 
Here the concept of true global optimizers, whose search is not solely driven by a 
local gradient, or one which is generically stochastic in nature, is needed. Simulated 
Annealing (SA) is such a global optimizer and has been very successfully used since it was 
originally introduced by Kirkpatrick et al. [8] . In particular, in these references SA was 
used to tackle the famed traveling salesman problem (a so-called NP-nondeterministic 
polynomial time problem). Due to its successful application a large amount of literature 
on SA has been published across fields, see, for instance, Refs. [HI QUI EEH fT2l [13] . 

SA is a search technique which borrows its concepts from the physical process of 
annealing. If one is trying to produce a good alloy, a molten state of the metallic 
mixtures is prepared at a very high temperature (starting annealing temperature T at ) 
and then gradually cooled. If the rate of cooling (annealing schedule) is slow enough 
the system will solidify at the minimum (stable) thermodynamic state. SA exactly 
follows this principle. The search space is sampled initially at a high temperature (T at ), 
and then gradually at lower values of T at determined by a preassigned schedule. The 
temperature T at controls the strength of thermal fluctuations, so that even if the system 
is trapped in a local minimum, a high enough T at can take it out of the attractive basin 
and the search can carry on to locate deeper minima, and eventually the global one. In 
the limit of very small T at or if the search is carried on for a sufficient length of time, 
the global minimum will be unequivocally found. Operational details of this procedure 
which will be presented below. 

In this communication we employ SA to study the dynamics of DNA denaturation 
bubbles. We show that SA finds the optimum value of the activation energy for base 
pair breaking and therefore reproduces correctly the bubble free energy landscape. 
Moreover we demonstrate that SA performs well even in the presence of strong noise 



Finding the optimum activation energy ... Chaudhury et al. 



3 




Figure 1. DNA denaturation bubble. The bubble consists of single-stranded DNA 
that is embedded in still intact double-strand. At the interface between bubble and 
double-strand base pairs break or reanneal stochastically, giving rise to the bubble 
dynamics. 

superimposed on the surrogate input data. In the optimization procedure we start with 
an arbitrary value of the activation energy and allow SA to improve it until the bubble 
size distribution matches a given distribution. We specifically use SA in this problem 
as we believe it is well suited to analyze experimental data on DNA bubble dynamics. 

The article is organized as follows. In the following we first discuss the methodology 
of SA in detail and connect it to the optimization of the activation energy required for 
DNA base pair opening. Then the simulation results are presented and discussed; and 
finally the article is concluded. 

2. DNA bubble dynamics and Simulated Annealing 

Double-stranded DNA can locally denature, i.e., the base pairs usually forming the 
double- helix break [14] . The result is a DNA bubble consisting of single-stranded DNA, 
see Fig. [TJ Driven by thermal fluctuations this bubble changes its size by breaking or 
reannealing of base pairs at the two interfaces between bubble and intact double-strand 
[14"] . The resulting DNA breathing, or DNA bubble dynamics therefore is a stochastic 
process [T5], [161 El- To follow the time series of the kinetics one needs to resort to a 
formulation that intrinsically includes the fluctuations in an efficient way. It was shown 
in Ref. [TEj that the Gillespie algorithm [18J, in which the multi-step rate equation is 
written down as a single master equation, provides time series and equilibrium quantities 
of bubble breathing dynamics. 

A denaturation bubble opens (nucleates) with an initiation free energy factor 
do = exp(— E s /kBT) ~ 10 5 . . . 10~ 3 at room temperature [19]. Once initiated, 
additional base pairs break with a free energy E(T), with associated statistical weight 
u = exp(— _E(T)//c#T). This free energy E(T) is composed of the free energies for 
hydrogen bonding and base stacking [20]. In general the value of E(T) will depend on 
the type of nearest neighbor base pairs (AT or GC) [20], however, here we consider 
a homopolymer DNA in which all E(T) are equal. Thermal melting of the DNA 
double-strand defining the melting temperature T m occurs when E{T m ) = 0. Below 
the melting temperature the fully closed (no bubble) state will always be favorable. 
However fluctuations may initiate a transient bubble. The ensuing opening/closing 
dynamics of a single DNA bubble can in fact be monitored in fluorescence correlation 
experiments [21]. At equilibrium the probability distribution to find a bubble of size 



Finding the optimum activation energy ... Chaudhury et at 



O 




P eq (m) □ 

T at = 17 

T at = 47 

Tat = 67 
_^ T at =100 

•:\ - 

-. 



■ 







8 10 12 14 16 18 



m 

Figure 2. Plot of the equilibrium probability P eq (m) to find a bubble of m broken 
base pairs for a bubble domain of maximum size m = 18. The parameters for the open 
boxes are <tq = 1CP 3 and u — 0.6. We also show the SA-estimates i 3 T at ( m ) a t various 
annealing temperatures T at . The plot for T at = 17 matches quite well with the preset 
surrogate equilibrium distribution, P eq (m). 



m broken base pairs is plotted in Fig. [2] with logarithmic ordinate. The initial jump 
corresponds to the initiation free energy factor a . Subsequently one observes a straight 
decay corresponding to the investment of the free energy E(T) per broken base pair. 

In this work we wish to employ a stochastic optimization search for the correct 
determination of u at the particular temperature in which the system is studied. For 
this we have a theoretical probability distribution profile for the occupation at various 
DNA sites P2] with which we will compare our distribution profiles for various u values 
determined for different annealing temperatures given by the annealing protocol. SA will 
guide our search for the optimum u such that we arrive at the theoretical distribution 
profile in as few iterations as possible. 

In the SA algorithm thermal fluctuations are used to cross barriers in the underlying 
search landscape such that optimum parameters can be found even in a rugged surface. 
The simulation is started at a sufficiently high annealing temperature T at . This 
makes nearly all moves acceptable as the criterion for accepting or rejecting a move 
is determined by the Metropolis criterion. In our case the objective function (often 
called cost function in the SA language) which is being minimized is the sum of the 
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squares of the difference of the occupation probabilities at the various sites 



m 



cost = ^(P eg (m)-P Tat (m)) 2 



(1) 



where P eq (m) denotes the preset surrogate probability distribution to have m broken 
base pairs, and PT at ( m ) is the distribution obtained from SA at temperature T at . If, on 
going from one step to the other in the annealing protocol defined below, the magnitude 
of the cost function decreases, this move is accepted. If it increases, the move is not 
discarded immediately. We subject it to the Metropolis test [22]: The quantity 



is computed where costj is the magnitude of the cost in the present step and costj_i is 
the value at the step before. As we observe an increase in the cost function, the quantity 
A is positive. The probability of accepting this move is then determined by evaluating 
the function 



By definition F ranges between and 1. For each evaluation of F we invoke a random 
number Rn between and 1. If F is greater than Rn we accept the move, following the 
idea that the cost increase corresponding to the "Boltzmann factor" F is smaller than 
the intensity of the fluctuation corresponding to the random number Rn. If F < Rn the 
move is rejected. Thus at high T at , F will be close to 1 and most moves will be accepted, 
and a greater region of search space will be sampled. As the simulation proceeds, T at is 
lowered by following the annealing schedule, as the optimization proceeds. This decrease 
in T at can be a simple decrease by a certain factor or an exponential decrease depending 
on the nature of the problem. Once on the right path towards the global minimum we 
need not search the entire space and concentrate on a small region which will funnel the 
search specifically to the global minimum. As T at is lowered less and less moves pass 
the Metropolis test. More and more of the accepted moves correspond to a decrease in 
the cost function and we move towards the global minimum. In the current setup it is 
the value of the base pair breaking free energy factor u for which F is minimized. 

3. Results and Discussion 

We focus on the evolution of a few quantities important to characterize the denaturation 
bubbles, as we move through the annealing schedule. The optimum search for u was 
started with the cost function far away from zero, the starting value being close to 0.1; 
the starting guess value of u chosen here is arbitrary, which could easily have been some 
other value far from the actual solution. In our simulation the annealing schedule was a 
2% decrease in T at , i.e., starting with an initial annealing temperature of T at = 100 after 
each simulation step (or temperature step) T at was decreased by 2% of its current value. 
For each temperature step T at in the SA run, a maximum of 30 samplings were carried 
out, however, in cases when 20 steps were successful from the Metropolis sampling, the 



A = COStj — COStj_! 



(2) 




(3) 
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simulation was started at the next lower annealing temperature obtained following the 
annealing schedule. In Fig. [3] we plot the profile of u against the inverse of the annealing 
temperature T at . The convergence is rapid towards the correct value u = 0.6, reaching 
u = 0.5997 after about 10 iterations. The simulation is carried on for more steps to 
ensure that the obtained value corresponds to a final plateau. 

During sampling for moves which pass the Metropolis test, the instantaneous values 
of u can show fluctuations at a particular annealing temperature (as one samples the 
system a number of times at a given annealing temperature) and more so when the 
annealing temperature is high. This fluctuations in u will decrease as we move closer 
to the solution which is achieved at a lower annealing temperature. This is because the 
Metropolis sampling scheme is such that lesser and lesser number of moves are accepted 
as the annealing temperature goes on decreasing. The histogram in the inset of Fig. [3] 
depicts this fluctuations in u for the entire range of annealing temperature in the SA 
sampling performed. Initially, the u values show higher fluctuations as these correspond 
to sampling at a higher annealing temperature, but these fluctuations gradually decrease 
and near the vicinity of the solution u values other than those close to the actual 
solution are not accepted. Hence, the maximum normalized count in the inset of Fig. [3] 
correspond to u = 0.6, which is the actual solution. 

In Fig. [2] we plot the evolution of the bubble size distribution Pj- at (m) for some 
different annealing temperatures T at . It can be seen that at higher T at finer details of the 
population of the free energy landscape, corresponding to the occupation of large bubble 
states (high rn) are not sampled, and consequently the probability distribution falls off 
too quickly. At successively lower temperatures the profile found from SA converges 
towards the true equilibrium distribution of the bubble. While the curve for T at = 100 
is far away from the equilibrium data, the curve for T at = 17 exactly superimposes 
on the equilibrium profile. The other two curves for T at = 67 and T at = 47 show the 
intermediate dynamics and approach towards the equilibrium profile, respectively. This 
is done so that we can follow the subtle features of the dynamics and exactly track 
whether larger m-th sites get populated or not. The observation is that as we proceed 
a significant population develops at larger m values with the decay from higher m to 
lower m becoming more gradual. This is expected physically since rare events need a 
finer sampling resolution (note the logarithmic axis in Fig. [2]). 

To test the robustness of the SA procedure, we also carried out calculations to see 
how a very noisy distribution profile, created by introducing reasonable perturbations 
onto the equilibrium distribution data (P eq (m) ± A x rn, A ~ 5, rn e [0, 1]), converges 
to the correct one. The results of the calculation are shown in Fig. HI The SA run 
was started with the zigzag profile (solid line) denoted as noisy data. The starting 
temperature was kept pretty large (at nearly 10 6 ), since the cost function was of a large 
value to start with. In this case the cost function can be written as 



which looks similar to Eq.(l), with the only interpretational change being that we are 



m 




(4) 



8=1 
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Figure 3. Plot of the SA estimate of the statistical weight factor u versus the 
annealing temperature 1/T at . Inset: Histogram showing the Metropolis sampling 
of instantaneous fluctuations in u for the entire range of annealing temperature 
performed during the optimization process. The main figure plots the best it at a 
given temperature step obtained from the fluctuating data shown in the histogram. 



trying to find out the correct distribution, keeping the activation energy u = 0.6 in 
all the cases. The goal of the SA run is to smoothen the noisy profile and reach the 
optimum distribution (given by symbols) for u = 0.6. Gradually with the progress of the 
calculation at successive steps (10, 50, 200 and 600) the profile becomes more regular, 
with the noisy structures vanishing and at the 600-th step a profile which matches the 
exact theoretical one is generated as is evident from the Fig. HJ 

4. Conclusion 

Potential application of the SA algorithm presented in this work will be analyzing 
experimental single molecule data. However, except in one case there is no available 
experimental data. The only experimental data that is available come from fluorescence 
correlation measurement [21J. However, it is only a question of time until new 
experimental data become available. Considering this the present communication tries 
to establish a ground work which can be further extended in analyzing experimental 
data in future. 
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Figure 4. (color online) Plot of log PT at (m) against m at various SA steps. The plot 
for 600 steps matches the theoretical distribution. 



Another potential application of SA would be the subject of recent publication [23] 
in which a novel setup for obtaining more accurate data for DNA denaturation has been 
proposed. In particular, this setup has the potential to reveal position-resolved data. 
Having these potential experiments in mind we wanted to find out the potential of SA 
to actually tackle the involved analysis. In the submitted work we show that this is 
actually possible. 

As we show in the present work, the SA algorithm is extremely insensitive to noisy 
input data; even after adding random values of significant amplitude to the input data 
the correct parameters are found. This was not necessarily expected a priori, and this 
is what we consider the major result of our work. 

In the recent work we have shown how optimum parameters in systems of biological 
interest can be found out with the use of stochastic optimization techniques. These 
results have inspired us to look into more complicated problems, like a two dimensional 
model in which not only the size of the bubble but the exact position of cleavage of base 
pairs can be found out [T7]. In addition to that, the experimental work involving single 
molecule techniques so far concentrates on homopolymer DNA. It will be interesting to 
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see what power SA exhibits when longer and heteropolymer structures are tested. Work 
in this direction is in progress, which we wish to address in our future communications. 
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